Load package:
library(marmap)
Import bathymetry data from NOAA:
b = getNOAA.bathy(lon1 = 115, lon2 = 140, lat1 = 35, lat2 = 20, resolution = 1)
Import dataframe with the latitude and longitude of sampling points (in decimal degrees):
pts1 <- read.csv("~/desktop/MiraiDPI/SamplingLatLong.csv")
pts1$Station<- as.character(pts1$Station)
pts_photos <- read.csv("~/desktop/MiraiDPI/SamplingLatLong2.csv")
pts_photos$Station<- as.character(pts_photos$Station)
pts_acanths <- read.csv("~/desktop/MiraiDPI/SamplingLatLong3.csv")
pts_acanths$Station<- as.character(pts_acanths$Station)
Make dataframe with map labels and their lat/long:
Clon = c(127.5,131.25, 121.6, 121.03, 120.8, 126)
Clat= c(26, 31.93, 31.1, 23, 27.2, 22)
CLabels= c("Okinawa", "Japan", "China", "Taiwan", "East China Sea", "Philippine Sea")
Countries = cbind.data.frame(Clon, Clat, CLabels)
str(Countries)
Make plot one layer at a time:
#define plot colors
blues <- c("lightsteelblue4", "lightsteelblue3", "lightsteelblue2", "lightsteelblue1")
greys <- c(grey(0.6), grey(0.93), grey(0.99))
#tiff("~/desktop/MiraiDPI/MiraiFiltersMap.tiff", height= 8, width =8, units = 'in', res=300) # uncomment to save plot to file
plot(b, image = TRUE, land = TRUE, xlim=c(120,135), ylim=c(22, 32), lwd = 0.03, bpal = list(c(0, max(b), greys), c(min(b),0 , blues)), cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5) #bathymetry colors
plot(b, lwd = 0.8, deep = 0, shallow = 0, step = 0, add = TRUE) # highlight coastline with solid black line
plot(b, deep=-200, shallow=-200, lwd = 0.4, drawlabels=T, add=T) # Add -200m isobath
plot(b, deep=-1000, shallow=-1000, lwd = 0.4, drawlabels=T, add=T) # Add -1000m isobath
plot(b, deep=-2000, shallow=-2000, lwd = 0.4, drawlabels=T, add=T) # Add -2000m isobath
plot(b, deep=-3000, shallow=-3000, lwd = 0.4, drawlabels=T, add=T) # Add -3000m isobath
points(pts1, pch = 21, bg = 'white', col = 'red') # Add sampling points
points(pts_photos, pch = 19, col = 'red') #Add points for stations with photo profiles
text(pts1, labels=pts1$Station, adj= 1.5, cex=1.2) # Add sampling station numbers
text(pts_photos, labels=pts_photos$Station, adj= 1.5, cex=1.2) # add photo station numbers
text(Countries, labels=Countries$CLabels, cex= 1.7, adj = -0.2) # Add map labels
#dev.off() #uncomment to save plot to file
Sample collection, DNA extraction, PCR, & sequencing
Seawater collected from the Subsurface Chlorophyll Maximum (SCM), mid-water column, and near-bottom was collected in 10 L Niskin bottles at each sampling station and surface seawater was collected by bucket. Five Liter replicates from SCM, mid, and near-bottom water samples and 4.5 L replicates from surface samples were sequentially filtered through 10.0 and 0.2 µm pore-size Polytetrafluoroethylene (PTFE) filters. Filters were flash frozen in liquid nitrogen and stored at -80 ºC until DNA extraction. DNA was extracted following the manufacturers protocols for the Qiagen/MoBio DNeasy PowerWater Kit, including the optional heating step to lyse cells. Amplicon PCR and sequencing library preparation followed the procedures described in the Illumina 16S Metagenomic Sequencing Library Preparation manual modified only to include universal eukaryotic primers for the v4 region of the 18S rRNA gene (F: Stoeck et al. 2010, R: Brisbin et al. 2018) and amplicon PCR conditions most appropriate for these primers. The 220 samples were radomly assigned to 4 sequencing runs on the Illumina MiSeq sequencing platform with v3 chemistry for 300x300-bp paired-end sequencing.
Sequences are available from the NCBI Sequencing Read Archive (SRA) with accession PRJNA546472.
Bioinformatic processing with Qiime2
Demultiplexed paired-end sequences provided by the OIST sequencing center were imported to Qiime2 (v2018.11) software (Bolyen et al. 2018). The Divisive Amplicon Denoising Algorithm (DADA) was implemented with the DADA2 plug-in for Qiime2 for quality filtering, chimera removal, and feature table construction (Callahan et al. 2015) separately for each sequencing run before merging the four resulting feature tables. Taxonomy was assigned to Amplicon Sequence Variants (ASVs) in the feature table by training a classifier on the Protist Ribosomal Reference (PR2) database to classify ASVs (Guillo et al. 2012). The feature table and assigned taxonomy were then exported from Qiime2 for further analysis.
Below is an example Qiime2 workflow for 1 sequencing run:
Activate an anaconda environment for qiime2: source activate qiime2-2018.11
Import sequencing data (all sequences for eacg sequencing run must be in their own directory): qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path ./seqs/seqrun1/ --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path run1_demux-paired-end.qza
Visualize squencing results (use these graphs to decide what length to trim sequences) qiime demux summarize --i-data run1_demux-paired-end.qza --o-visualization run1_demux.qzv
Run the DADA2 algorithm: qiime dada2 denoise-paired --i-demultiplexed-seqs run1_demux-paired-end.qza --output-dir ./dd2run1 --o-representative-sequences run1_rep-seqs --p-trim-left-f 10 --p-trim-left-r 10 --p-trunc-len-f 260 --p-trunc-len-r 250 --p-n-threads 3
Check results: qiime feature-table summarize --i-table ./dd2run1/table.qza --o-visualization ./dd2run1/table.qzv --m-sample-metadata-file ~/desktop/MiraiFilters/sampleMaptxt.txt
qiime metadata tabulate --m-input-file dd2run1/denoising_stats.qza --o-visualization dd2run1/denoising_stats.qzv
Merge feature tables from multiple sequencing runs: qiime feature-table merge --i-tables ./dd2run1/table.qza --i-tables ./dd2run2/table.qza --i-tables ./dd2run3/table.qza --i-tables ./dd2run4/table.qza --o-merged-table mergedtable.qza
qiime feature-table merge-seqs --i-data run1_rep-seqs.qza --i-data run2_rep-seqs.qza --i-data run3_rep-seqs.qza --i-data run4_rep-seqs.qza --o-merged-data merged-rep-seqs.qza
Import taxonomy database: qiime tools import --type 'FeatureData[Sequence]' --input-path pr2_version_4.11.1_mothur.fasta --output-path pr2.qza
qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path pr2_version_4.11.1_mothur.tax -output-path pr2_tax.qza
Select v4 region from PR2 sequences: qiime feature-classifier extract-reads --i-sequences pr2.qza --p-f-primer CCAGCASCYGCGGTAATTCC --p-r-primer ACTTTCGTTCTTGATYRA --o-reads pr2_v4.qza
Train the classifier: qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads pr2_v4.qza --i-reference-taxonomy pr2_tax.qza --o-classifier pr2_v4_classifier.qza
Classify: qiime feature-classifier classify-sklearn --i-classifier pr2_v4_classifier.qza --i-reads merged-rep-seqs.qza --o-classification taxonomy.qza
qiime metadata tabulate --m-input-file taxonomy.qza --o-visualization taxonomy.qzv
Export the .tsv taxonomy file from taxonomy.qzv for use in following steps.
Load packages:
library(tidyverse)
#devtools::install_github("jbisanz/qiime2R")
library(qiime2R)
library('phyloseq')
library('vegan')
library('ggplot2')
library(RColorBrewer)
library(DESeq2)
library(reshape)
library("wesanderson")
library("gridExtra")
library("ggpubr", lib.loc="~/Library/R/3.5/library")
library(Hmisc)
Set default colors:
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
P <- brewer.pal(12, "Paired")
S2 <- brewer.pal(8, "Set2")
S1 <- brewer.pal(8, "Set1")
D2 <- brewer.pal(8, "Dark2")
colors<- c(P, S2, S1,D2)
colors2 <- rep(colors, 6)
ap<- c("#F1B6A1", "#D4A52A", "#E3E5DB", "#A5CFCC", "#0E899F", "#A83860", "#ED91BC", "#DB5339", "#F58851", "#42465C", "#1E479A", "#F7CDA4", "#CF529C", "#11638C")
Dj<- wes_palette("Darjeeling2")
Cv <- wes_palette("Cavalcanti1")
Gb<- wes_palette("GrandBudapest1")
wpcolor<- c(Dj[2], Dj[4], Cv[2], Cv[4], Gb[2], Cv[5])
#function for drawing legends separately from plots
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
legend
}
Convert Qiime2 artifacts to phyloseq objects:
phyloseq<-qza_to_phyloseq(features="~/desktop/MiraiFilters/mergedtable.qza")
## Warning in read_qza(features): Artifact was not generated with Qiime2 2018.4,
## if data is not successfully imported, please report here github.com/jbisanz/
## qiime2R/issues
Import sample data:
metatable <- read.csv("~/desktop/MiraiFilters/SampleMap.csv")
row.names(metatable) <- metatable[[1]]
metatable$Station <- as.character(metatable$Station)
metatable$filter <-as.factor(metatable$filter)
META<- sample_data(metatable)
str(META)
## 'data.frame': 212 obs. of 32 variables:
## Formal class 'sample_data' [package "phyloseq"] with 4 slots
## ..@ .Data :List of 32
## .. ..$ : Factor w/ 212 levels "A6-F173-DNA",..: 3 6 9 12 96 97 115 118 129 134 ...
## .. ..$ : Factor w/ 212 levels "A6-F173-DNA",..: 3 6 9 12 96 97 115 118 129 134 ...
## .. ..$ : chr "2" "2" "2" "2" ...
## .. ..$ : Factor w/ 8 levels "1","31","32",..: 7 7 8 8 1 1 4 4 6 6 ...
## .. ..$ : Factor w/ 14 levels "C02","C03","C04",..: 1 1 1 1 1 1 1 1 1 1 ...
## .. ..$ : Factor w/ 4 levels "Bottom","Mid",..: 4 4 4 4 1 1 2 2 3 3 ...
## .. ..$ : Factor w/ 5 levels "Bottom","Deep",..: 5 5 5 5 1 1 3 3 4 4 ...
## .. ..$ : Factor w/ 2 levels "D","S": 2 2 2 2 1 1 1 1 2 2 ...
## .. ..$ : Factor w/ 109 levels "10Bott1","10Bott2",..: 71 71 72 72 65 65 67 67 69 69 ...
## .. ..$ : int 0 0 0 0 1000 1000 700 700 92 92 ...
## .. ..$ : Factor w/ 2 levels "2","10": 2 1 2 1 2 1 2 1 2 1 ...
## .. ..$ : Factor w/ 2 levels "n","y": 2 2 2 2 2 2 2 2 2 2 ...
## .. ..$ : Factor w/ 4 levels "KG","MOT","NOT",..: 1 1 1 1 1 1 1 1 1 1 ...
## .. ..$ : Factor w/ 3 levels "KG","NOT","SOT": 1 1 1 1 1 1 1 1 1 1 ...
## .. ..$ : Factor w/ 3 levels "n","p","y": 3 3 3 3 3 3 3 3 3 3 ...
## .. ..$ : Factor w/ 4 levels "KG","NOT","O",..: 1 1 1 1 1 1 1 1 1 1 ...
## .. ..$ : Factor w/ 2 levels "n","y": 2 2 2 2 2 2 2 2 2 2 ...
## .. ..$ : num 0 0 0 0 1056 ...
## .. ..$ : num 25.25 25.25 25.25 25.25 4.46 ...
## .. ..$ : num 34.7 34.7 34.7 34.7 34.4 ...
## .. ..$ : num 202.2 202.2 202.2 202.2 69.8 ...
## .. ..$ : num 23.1 23.1 23.1 23.1 27.3 ...
## .. ..$ : num 0.0698 0.0698 0.0698 0.0698 0.051 ...
## .. ..$ : num 0.061 0.061 0.061 0.061 1.107 ...
## .. ..$ : num 0.521 0.521 0.521 0.521 11.128 ...
## .. ..$ : num 0.06 0.06 0.06 0.06 37.51 ...
## .. ..$ : num 0.01 0.01 0.01 0.01 0.02 0.02 0 0 0.11 0.11 ...
## .. ..$ : num 1.05 1.05 1.05 1.05 111.04 ...
## .. ..$ : num 0.033 0.033 0.033 0.033 2.666 ...
## .. ..$ : num 0.05 0.05 0.05 0.05 0.03 0.03 0.03 0.03 0.04 0.04 ...
## .. ..$ : num 26.3 26.3 26.3 26.3 26.3 ...
## .. ..$ : num 126 126 126 126 126 ...
## ..@ names : chr "X" "SampleID" "Station" "Bottle" ...
## ..@ row.names: chr "A8-F17-DNA" "B8-F18-DNA" "C8-F19-DNA" "D8-F20-DNA" ...
## ..@ .S3Class : chr "data.frame"
Load PR2 taxonomy:
taxonomy <- read.csv("~/desktop/MiraiFilters/taxonomy.csv", stringsAsFactors = FALSE)
names(taxonomy) <- c("row", "tax", "Confidence")
row.names(taxonomy) <-taxonomy[[1]]
taxonomy <- taxonomy[,(-1)]
taxonomy <- separate(taxonomy, tax, c("D0","D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8", "D9", "D10", "D11", "D12", "D13", "D14"), sep = ";", fill = "right")
taxonomy <- taxonomy[,c(1:8)]
taxmat <- as.matrix(taxonomy)
TAX = tax_table(taxmat)
str(TAX)
## Formal class 'taxonomyTable' [package "phyloseq"] with 1 slot
## ..@ .Data: chr [1:23115, 1:8] "Eukaryota" "Eukaryota" "Eukaryota" "Eukaryota" ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:23115] "000a220fe8edd771dd82f6915627ef3e" "000c79760dd16d6692018deb8ea1b164" "000d9eaa21eaff08f34fb1a1a87601d5" "000e6f109e01181b2b1eda5d6c99319c" ...
## .. .. ..$ : chr [1:8] "D0" "D1" "D2" "D3" ...
Add taxonomy to phyloseq object:
ps = merge_phyloseq(phyloseq, TAX, META)
Some basic statistics:
#make ASV dataframe
OTUs <- data.frame(otu_table(ps))
Total number of ASVs in the data set:
OTUsRS<- OTUs
OTUsRS$RowSum <- rowSums(OTUsRS)
OTUsRSnoZero <- OTUsRS$RowSum!=0
sum(OTUsRSnoZero)
## [1] 22656
Total Number of Sequences:
sum(OTUsRS$RowSum)
## [1] 16340248
merge ASV table with taxonomy:
otusWtax <- merge(OTUs, taxonomy, by = 'row.names')
keep only Acantharian ASVs:
otusA <- filter(otusWtax, D3 == "Acantharea")
Number of acantharian ASVs in the data set:
OTUsRS<- otusA[,-c(1,213:220)]
OTUsRS$RowSum <- rowSums(OTUsRS)
OTUsRSnoZero <- OTUsRS$RowSum!=0
sum(OTUsRSnoZero)
## [1] 1053
Acantharian ASV Richness:
OTUs0 <- OTUsRS!=0 #is this number not a zero? true (1) or false (0)
csums <- colSums(OTUs0) # col sums = observed ASV richness
csumdf <- as.data.frame(csums)
max(csumdf$csums) #1906
## [1] 1053
min(csumdf$csums) #49
## [1] 2
mean(csumdf$csums) #730
## [1] 37.84434
how many acantharian seqs total:
csums <- colSums(OTUsRS) # col sums = observed ASV richness
csumdf <- as.data.frame(csums)
sum(csumdf$csums)
## [1] 1099858
remove mislabeled samples
mixedup <- c("F43", "F44", "F45", "F46", "F47", "F48", "F84")
'%ni%' <- Negate('%in%')
ps<- subset_samples(ps, SampleID %ni% mixedup)
remove metazoan ASVs and seq not classified at the kindom level
ps<-subset_taxa(ps, D1 != "")
ps<-subset_taxa(ps, D1 != "NA")
ps<-subset_taxa(ps, D2 != "Metazoa")
library("CoDaSeq")
OTU4clr<- data.frame(t(data.frame(otu_table(ps))))
row.names(OTU4clr) <- gsub("\\.", "", row.names(OTU4clr))
OTUs.clr <- codaSeq.clr(OTU4clr + 0.5, samples.by.row=TRUE)
OTU2 <- otu_table(as.matrix(OTUs.clr), taxa_are_rows = FALSE)
metatableCLR <-metatable
row.names(metatableCLR) <- gsub("-", "", row.names(metatableCLR))
META2<- sample_data(metatable)
psCLR <- phyloseq(OTU2,TAX,META2)
ordu = ordinate(psCLR, "PCoA", "euclidean")
p_full<-plot_ordination(psCLR, ordu, color="Depth", shape="filter")+theme_bw() +scale_color_manual(values=c(ap[8], ap[12], ap[5], ap[4])) + scale_shape_discrete(labels=c( "0.2", "10.0")) +
geom_point(size=3) +theme(text = element_text(size=16)) + ggtitle("A. Full Protist Communities")
p_full
There is clustering by sample depth: DCM, Surface, and Mid/Bottom. In the DCM and Surface, there is also clustering by filter type.
to better observe effect of filter pore-size
Surface
physeqSurf<- subset_samples(psCLR, Depth == "Surface")
OTUsSurf <- otu_table(physeqSurf)
metaSurf <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsSurf),]
adonis2(vegdist(OTUsSurf, method = "euclidean") ~ filter, data = metaSurf)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = vegdist(OTUsSurf, method = "euclidean") ~ filter, data = metaSurf)
## Df SumOfSqs R2 F Pr(>F)
## filter 1 3112 0.04279 1.2516 0.203
## Residual 28 69632 0.95721
## Total 29 72745 1.00000
ordu = ordinate(physeqSurf, "PCoA", "euclidean")
pSurf_full<-plot_ordination(physeqSurf, ordu, color = "filter")+theme_bw() +scale_color_manual(values = c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16)) #+ geom_vline(xintercept=0, linetype="dashed", color = "black")
pSurf_full
SCM
physeqSCM<- subset_samples(psCLR, Depth == "SCM")
OTUsSCM <- otu_table(physeqSCM)
metaSCM <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsSCM),]
adonis2(vegdist(OTUsSCM, method = "euclidean") ~ filter, data = metaSCM)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = vegdist(OTUsSCM, method = "euclidean") ~ filter, data = metaSCM)
## Df SumOfSqs R2 F Pr(>F)
## filter 1 2630 0.01457 0.7096 0.91
## Residual 48 177939 0.98543
## Total 49 180569 1.00000
ordu = ordinate(physeqSCM, "PCoA", "euclidean")
pSCM_full<-plot_ordination(physeqSCM, ordu, color = "filter")+theme_bw() +scale_color_manual(values = c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16))# + geom_vline(xintercept=0, linetype="dashed", color = "black")
pSCM_full
Mid
physeqMid<- subset_samples(psCLR, Depth == "Mid")
OTUsMid <- otu_table(physeqMid)
metaMid <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsMid),]
adonis2(vegdist(OTUsMid, method = "euclidean") ~ filter, data = metaMid)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = vegdist(OTUsMid, method = "euclidean") ~ filter, data = metaMid)
## Df SumOfSqs R2 F Pr(>F)
## filter 1 2418 0.02254 1.1067 0.246
## Residual 48 104884 0.97746
## Total 49 107302 1.00000
ordu = ordinate(physeqMid, "PCoA", "euclidean")
pMid_full<-plot_ordination(physeqMid, ordu, color = "filter")+theme_bw() +scale_color_manual(values = c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16)) #+ geom_hline(yintercept=0, linetype="dashed", color = "black")
pMid_full
Bottom
physeqBot<- subset_samples(psCLR, Depth == "Bottom")
OTUsBot <- otu_table(physeqBot)
metaBot <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsBot),]
adonis2(vegdist(OTUsBot, method = "euclidean") ~ filter, data = metaBot)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = vegdist(OTUsBot, method = "euclidean") ~ filter, data = metaBot)
## Df SumOfSqs R2 F Pr(>F)
## filter 1 2255 0.01729 0.8972 0.687
## Residual 51 128207 0.98271
## Total 52 130462 1.00000
ordu = ordinate(physeqBot, "PCoA", "euclidean")
pBot_full<-plot_ordination(physeqBot, ordu, color = "filter")+theme_bw() +scale_color_manual(values = c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16))
pBot_full
supp1<-ggarrange(pSurf_full, pSCM_full, pMid_full, pBot_full, common.legend = TRUE, legend = "none")
#supp1
library("CoDaSeq")
psAcan <- subset_taxa(ps, D3 =="Acantharea")
OTU4clr<- data.frame(t(data.frame(otu_table(psAcan))))
row.names(OTU4clr) <- gsub("\\.", "", row.names(OTU4clr))
OTUs.clr <- codaSeq.clr(OTU4clr + 0.5, samples.by.row=TRUE)
OTU2 <- otu_table(as.matrix(OTUs.clr), taxa_are_rows = FALSE)
metatableCLR <-metatable
row.names(metatableCLR) <- gsub("-", "", row.names(metatableCLR))
META2<- sample_data(metatable)
psCLR <- phyloseq(OTU2,TAX,META2)
ordu = ordinate(psCLR, "PCoA", "euclidean")
p_Acanth<-plot_ordination(psCLR, ordu, color="Depth", shape="filter")+theme_bw() +scale_color_manual(values=c(ap[8], ap[12], ap[5], ap[4])) + scale_shape_discrete(labels=c( "0.2", "10.0")) +
geom_point(size=3) +theme(text = element_text(size=16)) + ggtitle("B. Acantharian Communities")
p_Acanth
There is clustering by sample depth: SCM, Surface, and Mid/Bottom.
Surface
physeqSurf<- subset_samples(psCLR, Depth == "Surface")
OTUsSurf <- otu_table(physeqSurf)
metaSurf <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsSurf),]
adonis2(vegdist(OTUsSurf, method = "euclidean") ~ filter, data = metaSurf)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = vegdist(OTUsSurf, method = "euclidean") ~ filter, data = metaSurf)
## Df SumOfSqs R2 F Pr(>F)
## filter 1 34.85 0.03285 0.9511 0.479
## Residual 28 1025.93 0.96715
## Total 29 1060.78 1.00000
ordu = ordinate(physeqSurf, "PCoA", "euclidean")
pSurf_As<-plot_ordination(physeqSurf, ordu, color = "filter")+theme_bw() +scale_color_manual(values = c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16))
pSurf_As
SCM
physeqSCM<- subset_samples(psCLR, Depth == "SCM")
OTUsSCM <- otu_table(physeqSCM)
metaSCM <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsSCM),]
adonis2(vegdist(OTUsSCM, method = "euclidean") ~ filter, data = metaSCM)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = vegdist(OTUsSCM, method = "euclidean") ~ filter, data = metaSCM)
## Df SumOfSqs R2 F Pr(>F)
## filter 1 88.2 0.01092 0.5301 0.989
## Residual 48 7986.6 0.98908
## Total 49 8074.8 1.00000
ordu = ordinate(physeqSCM, "PCoA", "euclidean")
pSCM_As<-plot_ordination(physeqSCM, ordu, color ="filter")+theme_bw() +scale_color_manual(values = c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16))
pSCM_As
Mid
physeqMid<- subset_samples(psCLR, Depth == "Mid")
OTUsMid <- otu_table(physeqMid)
metaMid <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsMid),]
adonis2(vegdist(OTUsMid, method = "euclidean") ~ filter, data = metaMid)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = vegdist(OTUsMid, method = "euclidean") ~ filter, data = metaMid)
## Df SumOfSqs R2 F Pr(>F)
## filter 1 138.1 0.01857 0.9084 0.494
## Residual 48 7295.9 0.98143
## Total 49 7434.0 1.00000
ordu = ordinate(physeqMid, "PCoA", "euclidean")
pMid_As<-plot_ordination(physeqMid, ordu, color = "filter")+theme_bw() +scale_color_manual(values = c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16))
pMid_As
Bottom
physeqBot<- subset_samples(psCLR, Depth == "Bottom")
OTUsBot <- otu_table(physeqBot)
metaBot <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsBot),]
adonis2(vegdist(OTUsBot, method = "euclidean") ~ filter, data = metaBot)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = vegdist(OTUsBot, method = "euclidean") ~ filter, data = metaBot)
## Df SumOfSqs R2 F Pr(>F)
## filter 1 136.9 0.01535 0.795 0.749
## Residual 51 8780.4 0.98465
## Total 52 8917.2 1.00000
ordu = ordinate(physeqBot, "PCoA", "euclidean")
pBot_As<-plot_ordination(physeqBot, ordu, color = "filter")+theme_bw() +scale_color_manual(values = c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16))
pBot_As
supp_As<-ggarrange(pSurf_As, pSCM_As, pMid_As, pBot_As, common.legend = TRUE, legend = "none")
#supp_As
supp2<-ggarrange(p_full, p_Acanth, common.legend = TRUE, legend = "bottom")
#supp2
Merge Samples –> collapse replicates
Prepare metadata table:
metatable <- read.csv("sampleMap.csv")
row.names(metatable) <- metatable[[1]]
metatable$desc <- paste(metatable$Station, metatable$Depth, metatable$filter, sep="-")
metatable <- metatable[metatable$SampleID %ni% mixedup,]
metatable$Depth <- as.character(metatable$Depth)
metatable$Station <-as.character(metatable$Station)
metatable$filter<-as.character(metatable$filter)
metatable<- metatable[,-which(names(metatable) == "SampleID")]
META<- sample_data(metatable)
ps<- subset_samples(ps, SampleID %ni% mixedup)
ps<-subset_taxa(ps, D1 != "")
ps<-subset_taxa(ps, D1 != "NA")
ps<-subset_taxa(ps, D2 != "Metazoa")
ps2<- ps
sample_data(ps2) <- META
Merge:
mergedps <- merge_samples(ps2, "desc")
## Warning in asMethod(object): NAs introduced by coercion
## Warning in asMethod(object): NAs introduced by coercion
print(mergedps)
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 19574 taxa and 112 samples ]
## sample_data() Sample Data: [ 112 samples by 32 sample variables ]
## tax_table() Taxonomy Table: [ 19574 taxa by 8 taxonomic ranks ]
number of samples reduced from 211 to 112
Fix meta table in phyloseq object:
meta2<-as.data.frame(sample_data(mergedps))
split<- do.call(rbind, strsplit(row.names(meta2), "-"))
meta2$Depth <- split[,2]
meta2$filter<-split[,3]
meta2$desc <- row.names(meta2)
meta2$Station <- as.character(meta2$Station)
meta2$Depth_f <- factor(meta2$Depth, levels=c('Surface','SCM','Mid','Bottom'))
meta2$Station_f <- factor(meta2$Station, levels=c("11", "10", "9", "8", "3", "4", "2", "5", "12", "13", "14", "15", "17", "18"))
META <-sample_data(meta2)
sample_data(mergedps)<-META
Create phyloseq object with acantharian ASVs only:
phA<-subset_taxa(mergedps, D3 =="Acantharea")
phAra<- transform_sample_counts(phA, function(OTU) 100* OTU/sum(OTU))
print(phAra)
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 1053 taxa and 112 samples ]
## sample_data() Sample Data: [ 112 samples by 34 sample variables ]
## tax_table() Taxonomy Table: [ 1053 taxa by 8 taxonomic ranks ]
Relative abundance plot:
taxabarplot<-plot_bar(phAra, x= "Station_f", fill = "D4", facet_grid=filter~Depth_f) + scale_y_continuous(expand = c(0, 0)) + ggtitle("") + scale_fill_manual(values=wpcolor) + theme(legend.title=element_blank()) + geom_bar(aes( fill=D4), stat="identity", position="stack") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +theme_classic() +theme(text = element_text(size=14)) +ylab("Relative Abundance (%)") + xlab ("Station")
t2<-taxabarplot + theme(legend.position="bottom")
t2
#ggsave("/Users/brisbin/Desktop/MiraiDPI/Acantharea_TaxaBarPlot.pdf", width = 12, height =7)
The relative abundance plot shows that the two filters for each sample are similar at all four water depths. This is different than the pattern seen for the full community, where the samples clustered by filter type in the surface and SCM (and are visibly different in relative abundance plots).
The surface samples are dominated by photosymbiotic acantharian sequences (Arth/Symph).
The SCM sees the addition of Acantharea-Group-II and Group VI and slightly more Chaunacanthida.
Mid and Bottom samples are dominated by Group-I, but still has the others as well, including photosymbiotic clades.
Change order of clades in plot and group Acantharea_X with NA
a_ra <- psmelt(phAra)
a_ra$D4<- as.character(a_ra$D4)
a_ra$D4[is.na(a_ra$D4)] <- "Acantharea_X"
wpcolor<- c(Dj[2], Dj[4], Cv[2], Cv[4], Cv[5], Gb[2])
a_ra$D4<- factor(a_ra$D4, levels= c("Acantharea_X", "Acantharea-Group-I", "Acantharea-Group-II" , "Acantharea-Group-VI" , "Chaunacanthida", "Arthracanthida-Symphyacanthida" ))
a_ra$Station <- factor(a_ra$Station, levels=c("11", "10", "3", "9","4", "8", "2", "5", "12", "14", "15", "13", "17", "18"))
t2<-ggplot(a_ra, aes(x=Station, y=Abundance, fill=D4, )) + geom_bar(stat = "identity") +facet_grid(filter~Depth_f)+ scale_y_continuous(expand = c(0, 0)) +theme_classic() + scale_fill_manual(values=wpcolor) +theme(text = element_text(size=14)) +ylab("Relative Abundance (%)") + xlab ("Station")
t2<-t2 + theme(legend.position="bottom")
t2
#ggsave("/Users/brisbin/Desktop/MiraiDPI/Acantharea_TaxaBarPlot_RO.pdf", width = 11, height =6)
Take a closer look at the Chaunacanthida & Arthracanthida/Symphycanthida relative abundance comapared to eachother
phA<-subset_taxa(mergedps, D4 %in% c("Arthracanthida-Symphyacanthida", "Chaunacanthida") )
phAra<- transform_sample_counts(phA, function(OTU) 100* OTU/sum(OTU))
a_ra <- psmelt(phAra)
t3<-plot_bar(phAra, x= "Station_f", fill = "D4", facet_grid=filter~Depth_f) + scale_y_continuous(expand = c(0, 0)) + ggtitle("") + scale_fill_manual(values=c(wpcolor[6],wpcolor[5])) + theme(legend.title=element_blank()) + geom_bar(aes( fill=D4), stat="identity", position="stack") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +theme_classic() +theme(text = element_text(size=14)) +ylab("Relative Abundance (%)") + xlab ("Station")
t3<-t3 + theme(legend.position="bottom")
t3
a 10.0 filter and a 0.2 filter in mid-water has only Chaunacanthida, but station 3 only has arth/symph in 0.2 filter from bottom water
overall - does not seem to be a pattern as to whether C or A/S is more abundant in the deeper samples
Relative abundance of Arthracanthida/Symphyacanthida seqs
phA<-subset_taxa(mergedps, D4 == "Arthracanthida-Symphyacanthida" )
oOTUs <- data.frame(otu_table(phA))
phAra<- transform_sample_counts(phA, function(OTU) 100* OTU/sum(OTU))
a_ra <- psmelt(phAra)
t3<-plot_bar(phAra, x= "Station_f", fill = "D6", facet_grid=filter~Depth_f) + scale_y_continuous(expand = c(0, 0)) + ggtitle("") + theme(legend.title=element_blank()) + geom_bar(aes( fill=D6), stat="identity", position="stack") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +theme_classic() +theme(text = element_text(size=14)) +ylab("Relative Abundance (%)") + xlab ("Station")
t3<-t3 + theme(legend.position="bottom")
t3
## Warning: Removed 448 rows containing missing values (position_stack).
## Warning: Removed 448 rows containing missing values (position_stack).
Really interesting pattern in the sequences classified to genus level:
Acanthostaurs seems to be more abundant in surface and SCM, but present primarily in smaller fractions in deeper samples
Phyllostaurus seems to be only present in deeper samples, on both filters, but mostly on the larger filter.
Most of the ASVs are not classified to genus level (yellow and grey)
Merge by depth: Merge:
mergedpsD <- merge_samples(phA, "Depth")
## Warning in asMethod(object): NAs introduced by coercion
## Warning in asMethod(object): NAs introduced by coercion
print(mergedpsD)
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 224 taxa and 4 samples ]
## sample_data() Sample Data: [ 4 samples by 34 sample variables ]
## tax_table() Taxonomy Table: [ 224 taxa by 8 taxonomic ranks ]
number of samples reduced from 211 to 4
a_ra <- psmelt(mergedpsD)
a_ra1<- a_ra %>% filter(Abundance > 0) %>% mutate(D6= na_if(D6, "Arthracanthida-Symphyacanthida_XX"))
a_ra1$D6 <- as.character(a_ra1$D6)
a_ra2 <- a_ra1 %>% replace_na(list(D6 = "XXXX")) %>% arrange(D6)
#reverse!
a_ra2$OTU <- factor(a_ra2$OTU, levels = rev(unique(a_ra2$OTU )))
#change x axis order
bubble_plot <- ggplot(a_ra2,aes(Sample, OTU)) +
geom_point(aes(size=Abundance, fill=D6),shape=21,color="black") +
theme(panel.grid.major=element_line(linetype=1,color="grey"),
axis.text.x=element_text(angle=90,hjust=1,vjust=0),
panel.background = element_blank()) +
ylab("ASVs") +
xlab("Depth") +
scale_fill_brewer(palette="Paired", name="Genus") +
scale_size(name = "Read\ncounts")+ theme(axis.text.y=element_blank())
bubble_plot
Yes! 32 out of 224 individual Arth/Symph ASVs are present in surface/SCM and mid/bottom.
The more abundant ASVs were often found in deep and shallow samples.
This makes sense … if they are more abundant, it is more likely that they are caught in multiple 5 L samples.
Looking at the graph - only Phyllostaurus seems to be predominantly in aphotic zone despite having high abundance (although 3 ASVs also in shallow samples)
unfiltered samples - percent acantharean in each 10 um filter - plot as points by depth
phyloseq<-qza_to_phyloseq(features="~/desktop/MiraiFilters/mergedtable.qza")
## Warning in read_qza(features): Artifact was not generated with Qiime2 2018.4,
## if data is not successfully imported, please report here github.com/jbisanz/
## qiime2R/issues
metatable <- read.csv("~/desktop/MiraiFilters/sampleMap.csv")
row.names(metatable) <- metatable[[1]]
metatable$Station <- as.character(metatable$Station)
metatable$filter <- as.character(metatable$filter)
META_all<- sample_data(metatable)
ps = merge_phyloseq(phyloseq, TAX, META_all)
ps<- subset_samples(ps, SampleID %ni% mixedup)
ps<-subset_taxa(ps, D1 != "")
ps<-subset_taxa(ps, D1 != "NA")
ps<-subset_taxa(ps, D2 != "Metazoa")
psD3 = tax_glom(ps, "D3")
psD3ra <- transform_sample_counts(psD3, function(OTU) 100* OTU/sum(OTU))
psD3ra_Aonly<- subset_taxa(psD3ra, D3 =="Acantharea")
psD3ra_Aonly_10<-subset_samples(psD3ra_Aonly, filter == "10")
psD3ra_Aonly_10_otus <-data.frame(t(data.frame(otu_table(psD3ra_Aonly_10))))
names(psD3ra_Aonly_10_otus) <- c("acanth")
metatable_percent_plot <- metatable
row.names(metatable_percent_plot) <- gsub("-", ".", row.names(metatable_percent_plot))
psD3ra_Aonly_10_otus_meta <- merge(psD3ra_Aonly_10_otus, metatable_percent_plot, by = "row.names")
check to see if linear regression assumptions are met:
depthLM <- lm(acanth ~ m, data = psD3ra_Aonly_10_otus_meta)
par(mfrow = c(2, 2))
plot(depthLM)
Residuals are not normally distributed
Homoscedasticity doesn’t look terrible, but..
# Evaluate homoscedasticity
# non-constant error variance test
ncvTest(depthLM)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 7.595509, Df = 1, p = 0.0058514
Not homoscedastic
–> use local curve fitting (loess)
percent_depth <- ggplot(psD3ra_Aonly_10_otus_meta, aes(x=m, y = acanth)) + geom_point() + theme_bw() +theme(text = element_text(size=19)) + xlab("Depth (m)") +ylab("% Acantharian Sequences") +ggtitle("A.") +
geom_smooth(span =1, color = "#11638C") + theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
percent_depth
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Correlation Coefficient
Check if data are normally distributed (if normally distributed, use parametric Pearson coefficient; if not, use non-paranetric Spearman coefficient)
ggdensity(psD3ra_Aonly_10_otus_meta$acanth,
main = "Density plot of acanth percentage",
xlab = "acanth percentage")
ggqqplot(psD3ra_Aonly_10_otus_meta$acanth)
shapiro.test(psD3ra_Aonly_10_otus_meta$acanth)
##
## Shapiro-Wilk normality test
##
## data: psD3ra_Aonly_10_otus_meta$acanth
## W = 0.88623, p-value = 1.422e-07
if p-value > 0.05 data are normally distributed
this p value indicates that the data distribution is significantly different than normal distribution
assumptions for Pearsons (parametric) - both values are normally distributed, linearity, and homoscedasticity, no outliers
Spearman does not require normally distributed data (non-parametric) –> Use Spearman
df_for_corr <- psD3ra_Aonly_10_otus_meta %>% dplyr::select(acanth, m)
rcorr(as.matrix(df_for_corr), type = "spearman")
## acanth m
## acanth 1.00 0.59
## m 0.59 1.00
##
## n= 108
##
##
## P
## acanth m
## acanth 0
## m 0
Percent contribution of acantharians to the full community is significantly positively correlated to depth (increased % with increased depth).
psD4 = tax_glom(ps, "D4")
psD4ra <- transform_sample_counts(psD4, function(OTU) 100* OTU/sum(OTU))
psD4ra_Aonly<- subset_taxa(psD4ra, D4 %in% c("Arthracanthida-Symphyacanthida", "Chaunacanthida"))
psD4ra_Aonly_10<-subset_samples(psD4ra_Aonly, filter == "10")
psD4ra_Aonly_10_otus <-data.frame(t(data.frame(otu_table(psD4ra_Aonly_10))))
psD4ra_Aonly_10_otus$acanth <- rowSums(psD4ra_Aonly_10_otus)
metatable_percent_plot <- metatable
row.names(metatable_percent_plot) <- gsub("-", ".", row.names(metatable_percent_plot))
psD4ra_Aonly_10_otus_meta <- merge(psD4ra_Aonly_10_otus, metatable_percent_plot, by = "row.names")
check to see if linear regression assumptions are met:
depthLM <- lm(acanth ~ m, data = psD4ra_Aonly_10_otus_meta)
par(mfrow = c(2, 2))
plot(depthLM)
Residuals are not normally distributed
doesn’t look homoscedastic …
# Evaluate homoscedasticity
# non-constant error variance test
ncvTest(depthLM)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 8.223355, Df = 1, p = 0.0041355
Not homoscedastic
–> use local curve fitting (loess)
percent_depth_photo <- ggplot(psD4ra_Aonly_10_otus_meta, aes(x=m, y = acanth)) + geom_point() + theme_bw() +theme(text = element_text(size=19)) + xlab("Depth (m)") +ylab("% Arthracanthida-Symphyacanthida-Chaunacanthida") +ggtitle("B.") +
geom_smooth(span =1, color = Cv[5]) + theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
percent_depth_photo
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Correlation Coefficient
Check if data are normally distributed (if normally distributed, use parametric Pearson coefficient; if not, use non-paranetric Spearman coefficient)
ggdensity(psD4ra_Aonly_10_otus_meta$acanth,
main = "Density plot of acanth percentage",
xlab = "acanth percentage")
ggqqplot(psD4ra_Aonly_10_otus_meta$acanth)
shapiro.test(psD4ra_Aonly_10_otus_meta$acanth)
##
## Shapiro-Wilk normality test
##
## data: psD4ra_Aonly_10_otus_meta$acanth
## W = 0.78083, p-value = 2.138e-11
NOT normal –> use Spearman correlation coefficient
dffor_corr <- psD4ra_Aonly_10_otus_meta %>% dplyr::select(acanth, m)
rcorr(as.matrix(dffor_corr), type = "spearman")
## acanth m
## acanth 1.00 -0.61
## m -0.61 1.00
##
## n= 108
##
##
## P
## acanth m
## acanth 0
## m 0
Percent contribution of photosymbiotic acantharians to the full community is significantly negatively correlated to depth (decreased % with increased depth).
ggarrange(percent_depth, percent_depth_photo)
#ggsave("loess_figs1.png", width = 14, height = 7)
Acantharian ROIs and full raw images are archived on Zenodo (doi: 10.5281/zenodo.3605400).
Load additional packages:
library(readr)
library(reshape2)
#library("plyr", lib.loc="/Library/Frameworks/R.framework/Versions/3.4/Resources/library")
Station 17
Load and format CTD data:
ctd17 <- read.csv("~/Desktop/MiraiDPI/DPI_CTD_data/6KCDT0006/CTD/6KCDT0006_CTD_data_1sec_int.csv")
ctd17 <- ctd17[,1:4]
names(ctd17) <- c("Time", "Temp", "Salinity", "Depth")
ctd17$Depth <- ctd17$Depth * -1
ctd17$station <- 17
ctd17$Time <- gsub(":", "", ctd17$Time)
ctd17$Time2 <- paste0('201706090', ctd17$Time)
st17aCTD <- ctd17[,c(4,6)]
names(st17aCTD)<-c("Depth","photos")
str(st17aCTD)
## 'data.frame': 9576 obs. of 2 variables:
## $ Depth : num -12.6 -12.9 -13.1 -13.3 -13.3 ...
## $ photos: chr "20170609085210" "20170609085211" "20170609085212" "20170609085213" ...
Load and format image data:
st17as<- read.table("~/Desktop/MiraiDPI/st17DPI/st17rois.txt", header = FALSE)
names(st17as) <- c("photos")
st17as$photos <- strtrim(st17as$photos, 14)
st17as$photos <- as.numeric(st17as$photos)
## Warning: NAs introduced by coercion
Add depth from CTD data to images:
st17acanth <- merge(st17aCTD,st17as, by = "photos" )
Calculate the sampling volume (in order to get cells/L): Load names of all images from station 17 downcast:
st17pics<- read.table("~/desktop/MiraiDPI/st17DPI/st17ALLphotos.txt", header = TRUE)
st17pics$photos <- strtrim(st17pics$photos, 14)
st17pics$photos <- as.numeric(st17pics$photos)
## Warning: NAs introduced by coercion
st17pics <- merge(st17aCTD,st17pics, by = "photos" )
Get dataframe of the number of photos by depth at 10m intervals:
br = seq( -800, 0, by = 10)
ranges = br[-1]
freq = hist(st17pics$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)
P<- data.frame(depth = ranges, photofrequency = freq$counts)
str(P)
## 'data.frame': 80 obs. of 2 variables:
## $ depth : num -790 -780 -770 -760 -750 -740 -730 -720 -710 -700 ...
## $ photofrequency: int 0 0 215 26 26 25 26 26 26 7 ...
How many photos total?
sum(P$photofrequency)
## [1] 2453
Get same data frame for Acantharian frequency:
br = seq( -800, 0, by = 10)
ranges = br[-1]
freq = hist(st17acanth$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)
A<- data.frame(depth = ranges, Acanthfrequency = freq$counts)
str(A)
## 'data.frame': 80 obs. of 2 variables:
## $ depth : num -790 -780 -770 -760 -750 -740 -730 -720 -710 -700 ...
## $ Acanthfrequency: int 0 0 2 0 0 0 0 0 0 0 ...
How many acanths total?
sum(A$Acanthfrequency)
## [1] 237
Divide the acantharian frequency by photo frequency and convert to cells/L.
For stations 3 and 10 the camera and led were 15.4cm apart. For stations 15 and 17, the camera and LED were 14 cm apart.
5.5 x 4.6 x 15.4 = 389.62 cubic cm = 0.38962 L at stations 3 & 10
5.5 x 4.6 x 14 = 354.2 cubic cm = 0.3542 L at stations 15 & 17
station 17 volume calculation:
frequencyDF17 <- merge(A, P, by = "depth")[-1,]
frequencyDF17$conc <- frequencyDF17$Acanthfrequency / frequencyDF17$photofrequency
frequencyDF17$perLiter <- frequencyDF17$conc / 0.3542
frequencyDF17$Station <- "st.17"
frequencyDF17$pos_depth <- frequencyDF17$depth * -1
Station 3
Load and format CTD data:
ctd3 <- read.csv("~/Desktop/MiraiDPI/DPI_CTD_data/6KCDT0003/CTD/6KCDT0003_CTD_data_1sec_int.csv")
ctd3 <- ctd3[,1:4]
names(ctd3) <- c("Time", "Temp", "Salinity", "Depth")
ctd3$Depth <- ctd3$Depth * -1
ctd3$station <- 3
ctd3$Time <- gsub(":", "", ctd3$Time)
ctd3$Time2 <- paste0('201705310', ctd3$Time)
st3aCTD <- ctd3[,c(4,6)]
names(st3aCTD)<-c("Depth","photos")
str(st3aCTD)
## 'data.frame': 6584 obs. of 2 variables:
## $ Depth : num -7.39 -7.28 -7.21 -7.22 -7.26 ...
## $ photos: chr "20170531074319" "20170531074320" "20170531074321" "20170531074322" ...
Load and format image data:
st3as<- read.table("~/Desktop/MiraiDPI/st3DPI/st3rois.txt", header = FALSE)
names(st3as) <- c("photos")
st3as$photos <- strtrim(st3as$photos, 14)
st3as$photos <- as.numeric(st3as$photos)
## Warning: NAs introduced by coercion
st3acanth <- merge(st3aCTD,st3as, by = "photos" )
Calculate the sampling volume (in order to get cells/L): Load names of all images from station 3 downcast:
st3pics<- read.table("~/desktop/MiraiDPI/st3DPI/st3ALLphotos.txt", header = TRUE)
st3pics$photos <- strtrim(st3pics$photos, 14)
st3pics$photos <- as.numeric(st3pics$photos)
st3pics <- merge(st3aCTD,st3pics, by = "photos" )
st3pics <- st3pics[,-3]
names(st3pics)<- c("photos", "Depth")
st3pics$photos <- as.character(st3pics$photos)
Get data frame of image frequency by depth in 10m intervals:
br = seq( -1020, 0, by = 10)
ranges = br[-1]
freq = hist(st3pics$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)
P3<- data.frame(depth = ranges, photofrequency = freq$counts)
str(P3)
## 'data.frame': 102 obs. of 2 variables:
## $ depth : num -1010 -1000 -990 -980 -970 -960 -950 -940 -930 -920 ...
## $ photofrequency: int 0 109 38 39 40 24 33 38 39 38 ...
How many photos total?
sum(P3$photofrequency)
## [1] 4010
Get same data frame for Acantharian frequency:
br = seq( -1020, 0, by = 10)
ranges = br[-1]
freq = hist(st3acanth$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)
A3<- data.frame(depth = ranges, Acanthfrequency = freq$counts)
str(A3)
## 'data.frame': 102 obs. of 2 variables:
## $ depth : num -1010 -1000 -990 -980 -970 -960 -950 -940 -930 -920 ...
## $ Acanthfrequency: int 0 0 0 0 0 0 0 0 0 0 ...
How many acanths total?
sum(A3$Acanthfrequency)
## [1] 115
Divide the acantharian frequency by photo frequency and convert to cells/L. And, plot cells/L by depth:
frequencyDF3 <- merge(A3, P3, by = "depth")[-1,]
frequencyDF3$conc <- frequencyDF3$Acanthfrequency / frequencyDF3$photofrequency
frequencyDF3$perLiter <- frequencyDF3$conc / 0.38962
frequencyDF3$pos_depth <- frequencyDF3$depth * -1
frequencyDF3$Station <- "st.3"
Station 10
Load and format CTD data:
ctd10 <- read.csv("~/Desktop/MiraiDPI/DPI_CTD_data/6KCDT0004/CTD/6KCDT0004_CTD_data_1sec_int.csv")
ctd10 <- ctd10[,1:4]
names(ctd10) <- c("Time", "Temp", "Salinity", "Depth")
ctd10$Depth <- ctd10$Depth * -1
ctd10$station <- 10
ctd10$Time <- gsub(":", "", ctd10$Time)
ctd10$Time2 <- paste0('201706020', ctd10$Time)
st10aCTD <- ctd10[,c(4,6)]
names(st10aCTD)<-c("Depth","photos")
Load and format image data:
st10aNames<- c("photos")
st10as<- read.table("~/Desktop/MiraiDPI/st10DPI/st10rois.txt", col.names = st10aNames)
st10as$photos <- strtrim(st10as$photos, 14)
st10acanth <- merge(st10aCTD,st10as, by = "photos" )
Calculate the sampling volume (in order to get cells/L): Load names of all images from station 3 downcast:
st10pics<- read.table("~/desktop/MiraiDPI/st10DPI/st10ALLphotos.txt", col.names = st10aNames)
st10pics$photos <- strtrim(st10pics$photos, 14)
st10pics$photos <- as.numeric(st10pics$photos)
st10pics <- merge(st10aCTD,st10pics, by = "photos" )
st10pics <- st10pics[,-3]
names(st10pics)<- c("photos", "Depth")
st10pics$photos <- as.character(st10pics$photos)
Get data frame of image frequency by depth in 10m intervals:
br = seq( -1020, 0, by = 10)
ranges = br[-1]
freq = hist(st10pics$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)
P10<- data.frame(depth = ranges, photofrequency = freq$counts)
str(P10)
## 'data.frame': 102 obs. of 2 variables:
## $ depth : num -1010 -1000 -990 -980 -970 -960 -950 -940 -930 -920 ...
## $ photofrequency: int 0 6 51 33 32 33 32 32 33 33 ...
How many photos total?
sum(P10$photofrequency)
## [1] 3639
Get same data frame for Acantharian frequency:
br = seq( -1020, 0, by = 10)
ranges = br[-1]
freq = hist(st10acanth$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)
A10<- data.frame(depth = ranges, Acanthfrequency = freq$counts)
str(A10)
## 'data.frame': 102 obs. of 2 variables:
## $ depth : num -1010 -1000 -990 -980 -970 -960 -950 -940 -930 -920 ...
## $ Acanthfrequency: int 0 0 1 0 0 0 0 0 0 4 ...
sum(A10$Acanthfrequency)
## [1] 359
Divide the acantharian frequency by photo frequency and convert to cells/L. And, plot cells/L by depth:
frequencyDF10 <- merge(A10, P10, by = "depth")[-1,]
frequencyDF10$conc <- frequencyDF10$Acanthfrequency / frequencyDF10$photofrequency
frequencyDF10$perLiter <- frequencyDF10$conc / 0.38962
frequencyDF10$Station <- "st.10"
frequencyDF10$pos_depth <- frequencyDF10$depth * -1
Station 15
Load and format CTD data:
ctd15 <- read.csv("~/Desktop/MiraiDPI/DPI_CTD_data/6KCDT0005/CTD/6KCDT0005_CTD_data_1sec_int.csv")
ctd15 <- ctd15[,1:4]
names(ctd15) <- c("Time", "Temp", "Salinity", "Depth")
ctd15$Depth <- ctd15$Depth * -1
ctd15$station <- 10
ctd15$Time <- gsub(":", "", ctd15$Time)
ctd15$Time2 <- paste0('20170606', ctd15$Time)
st15aCTD <- ctd15[,c(4,6)]
names(st15aCTD)<-c("Depth","photos")
st15as<- read.table("~/Desktop/MiraiDPI/st15DPI/st15rois.txt", col.names = st10aNames)
st15as$photos <- strtrim(st15as$photos, 14)
st15acanth <- merge(st15aCTD,st15as, by = "photos" )
Calculate the sampling volume (in order to get cells/L): Load names of all images from station 3 downcast:
st15pics<- read.table("~/desktop/MiraiDPI/st15DPI/st15ALLphotos.txt", col.names = st10aNames)
st15pics$photos <- strtrim(st15pics$photos, 14)
st15pics$photos <- as.numeric(st15pics$photos)
## Warning: NAs introduced by coercion
st15pics <- merge(st15aCTD,st15pics, by = "photos" )
st15pics <- st15pics[,-3]
names(st15pics)<- c("photos", "Depth")
st15pics$photos <- as.character(st15pics$photos)
Get data frame of image frequency by depth in 10m intervals:
br = seq( -780, 0, by = 10)
ranges = br[-1]
freq = hist(st15pics$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)
P15<- data.frame(depth = ranges, photofrequency = freq$counts)
str(P15)
## 'data.frame': 78 obs. of 2 variables:
## $ depth : num -770 -760 -750 -740 -730 -720 -710 -700 -690 -680 ...
## $ photofrequency: int 38 50 49 45 42 39 42 41 39 37 ...
How many photos total?
sum(P15$photofrequency)
## [1] 3056
Get same data frame for Acantharian frequency:
br = seq( -780, 0, by = 10)
ranges = br[-1]
freq = hist(st15acanth$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)
A15<- data.frame(depth = ranges, Acanthfrequency = freq$counts)
str(A15)
## 'data.frame': 78 obs. of 2 variables:
## $ depth : num -770 -760 -750 -740 -730 -720 -710 -700 -690 -680 ...
## $ Acanthfrequency: int 1 1 0 3 2 4 2 1 5 3 ...
sum(A15$Acanthfrequency)
## [1] 524
Divide the acantharian frequency by photo frequency and convert to cells/L. And, plot cells/L by depth:
frequencyDF15 <- merge(A15, P15, by = "depth")[-1,]
frequencyDF15$conc <- frequencyDF15$Acanthfrequency / frequencyDF15$photofrequency
frequencyDF15$perLiter <- frequencyDF15$conc / 0.3542
frequencyDF15$pos_depth <- frequencyDF15$depth * -1
frequencyDF15$Station <- "st.15"
frequencyDF_all<- rbind(frequencyDF10, frequencyDF15, frequencyDF17, frequencyDF3)
frequencyDF_all$Station <- factor(frequencyDF_all$Station, levels = c("st.10", "st.3", "st.15", "st.17"))
Check assumptions for linear regression:
frequencyDF_all$logdepth = log(frequencyDF_all$pos_depth+1)
frequencyDF_all$logconc = log(frequencyDF_all$perLiter+1)
depth_conc_LM <- lm(logdepth ~ logconc, data = frequencyDF_all)
par(mfrow = c(2, 2))
plot(depth_conc_LM)
Residuals are normally distributed, but not homoscedastic:
# Evaluate homoscedasticity
# non-constant error variance test
ncvTest(depth_conc_LM)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 81.69586, Df = 1, p = < 2.22e-16
Use Loess and correlation coefficient:
frequencyDF_all<- rbind(frequencyDF10, frequencyDF15, frequencyDF17, frequencyDF3)
frequencyDF_all$Station <- factor(frequencyDF_all$Station, levels = c("st.10", "st.3", "st.15", "st.17"))
conc_depth_all <- ggplot(frequencyDF_all, aes(x=pos_depth, y = perLiter)) + geom_point() + theme_bw() +theme(text = element_text(size=14)) + xlab("") +ylab("Imaged acantharian cells (per Liter)") +ggtitle("") + geom_smooth(color = "#0E899F")+ theme(axis.text.y = element_text(angle = 90, hjust = 1))+ theme(axis.text.x = element_text(angle = 90, hjust = 1)) + theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
conc_depth_all
maximum concentration and depth at each station:
frequencyDF_all %>% dplyr::select(perLiter, depth, Station) %>% group_by(Station) %>% top_n(n=1, wt = perLiter) %>% arrange(Station)
## # A tibble: 4 x 3
## # Groups: Station [4]
## perLiter depth Station
## <dbl> <dbl> <fct>
## 1 1.81 -50 st.10
## 2 0.921 -30 st.3
## 3 4.71 0 st.15
## 4 2.82 0 st.17
Data are not normally distributed:
ggqqplot(frequencyDF_all$perLiter)
## Warning: Removed 2 rows containing non-finite values (stat_qq).
## Warning: Removed 2 rows containing non-finite values (stat_qq_line).
## Warning: Removed 2 rows containing non-finite values (stat_qq_line).
shapiro.test(frequencyDF_all$perLiter)
##
## Shapiro-Wilk normality test
##
## data: frequencyDF_all$perLiter
## W = 0.38755, p-value < 2.2e-16
Use Spearman Correlation Coefficient:
df_for_corr <- frequencyDF_all %>% dplyr::select(perLiter, pos_depth)
library(Hmisc)
rcorr(as.matrix(df_for_corr), type = "spearman")
## perLiter pos_depth
## perLiter 1.00 -0.55
## pos_depth -0.55 1.00
##
## n
## perLiter pos_depth
## perLiter 356 356
## pos_depth 356 358
##
## P
## perLiter pos_depth
## perLiter 0
## pos_depth 0
Cell concentration is significantly negatively correlated with depth
frequencyDF_surf <- frequencyDF_all %>% dplyr::filter(pos_depth < 50) %>% dplyr::filter(!is.na(perLiter)) %>% dplyr::group_by(Station)%>% dplyr::summarize(Conc= mean(perLiter))
frequencyDF_surf$Depth<- "Surface"
frequencyDF_SCM <- frequencyDF_all %>% dplyr::filter(pos_depth > 50 & pos_depth < 150) %>% dplyr::filter(!is.na(perLiter)) %>% dplyr::group_by(Station) %>% dplyr::summarize(Conc= mean(perLiter))
frequencyDF_SCM$Depth <-"SCM"
frequencyDF_Mid <- frequencyDF_all %>% dplyr::filter(pos_depth > 150 & pos_depth < 700) %>% dplyr::filter(!is.na(perLiter))%>% dplyr::group_by(Station) %>% dplyr::summarize(Conc= mean(perLiter))
frequencyDF_Mid$Depth <- "Mid"
frequencyDF_Bot<- frequencyDF_all%>% dplyr::filter(pos_depth > 700) %>% dplyr::filter(!is.na(perLiter)) %>% dplyr::group_by(Station) %>% dplyr::summarize(Conc= mean(perLiter))
frequencyDF_Bot$Depth <- "Bottom"
frequencyBYlayer<-rbind(frequencyDF_surf, frequencyDF_SCM, frequencyDF_Mid, frequencyDF_Bot)
mergedps merged replicates so there is one sample per combination of variables
psD4 = tax_glom(mergedps, "D4")
psD4ra <- transform_sample_counts(psD4, function(OTU) 100* OTU/sum(OTU))
psD4ra_Aonly<- subset_taxa(psD4ra, D4 %in% c("Arthracanthida-Symphyacanthida", "Chaunacanthida"))
psD4ra_Aonly_10<-subset_samples(psD4ra_Aonly, filter == "10")
psD4ra_Aonly_10_otus <-data.frame(otu_table(psD4ra_Aonly_10))
psD4ra_Aonly_10_otus$acanth <- rowSums(psD4ra_Aonly_10_otus)
metatable <- read.csv("sampleMap.csv")
row.names(metatable) <- metatable[[1]]
metatable$desc <- paste(metatable$Station, metatable$Depth, metatable$filter, sep="-")
metatable <- metatable[metatable$SampleID %ni% mixedup,]
metatable$Depth <- as.character(metatable$Depth)
metatable$Station <-as.character(metatable$Station)
metatable$filter<-as.character(metatable$filter)
metatable<- metatable[,-which(names(metatable) == "SampleID")]
metatable_percent_plot <- metatable %>% distinct(desc, .keep_all = TRUE) %>% filter(filter== "10")
row.names(metatable_percent_plot) <- metatable_percent_plot$desc
psD4ra_Aonly_10_otus_meta <- merge(psD4ra_Aonly_10_otus, metatable_percent_plot, by = "row.names")
DPIstations <- psD4ra_Aonly_10_otus_meta %>% dplyr::select(acanth, Station, Depth)
DPIstations$Station <- paste0('st.', DPIstations$Station)
final <- merge(DPIstations, frequencyBYlayer)
plot count by percent
final2 <- final %>% filter(Conc < 2 & acanth < 5) # remove two outliers
countBYperc2 <- ggplot(final2, aes(x=Conc, y = acanth)) + geom_point() + theme_bw() +theme(text = element_text(size=19)) + xlab("Imaged acantharian cells (per L)") +ylab("% Chaunacanthida-Arthracanthida-Symphyacanthida Sequences") +
geom_smooth(method = lm, color = Cv[5]) +theme( axis.title.y = element_text(size = 17)) + theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
countBYperc2
## `geom_smooth()` using formula 'y ~ x'
ggsave("countbypercent_noO.png", width = 8, height = 8)
## `geom_smooth()` using formula 'y ~ x'
concByperc <- lm(acanth ~ Conc, data = final2)
par(mfrow = c(2, 2))
plot(concByperc )
Residuals are normally distributed
# Evaluate homoscedasticity
# non-constant error variance test
ncvTest(concByperc )
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 0.9092953, Df = 1, p = 0.3403
Homoscedastic!
summary(concByperc)
##
## Call:
## lm(formula = acanth ~ Conc, data = final2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8867 -0.5956 -0.3601 0.1804 3.3026
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.0217 0.3639 2.808 0.0158 *
## Conc 1.5803 0.6186 2.555 0.0252 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.114 on 12 degrees of freedom
## Multiple R-squared: 0.3523, Adjusted R-squared: 0.2983
## F-statistic: 6.527 on 1 and 12 DF, p-value: 0.02524
Data are not normally distributed:
shapiro.test(final2$acanth)
##
## Shapiro-Wilk normality test
##
## data: final2$acanth
## W = 0.85758, p-value = 0.0282
ggqqplot(final2$acanth)
shapiro.test(final2$Conc)
##
## Shapiro-Wilk normality test
##
## data: final2$Conc
## W = 0.69015, p-value = 0.0002927
ggqqplot(final2$Conc)
Spearman Correlation:
df_for_corr <- final2 %>% dplyr::select(acanth, Conc)
library(Hmisc)
rcorr(as.matrix(df_for_corr), type = "spearman")
## acanth Conc
## acanth 1.00 0.75
## Conc 0.75 1.00
##
## n= 14
##
##
## P
## acanth Conc
## acanth 0.002
## Conc 0.002
cropped all acantharian photos to touch the edges of the acantharian spines. Then got image size in pixels with: file ./* > imagesize.csv
entire image is 2448 x 2050 pixels 5.5 cm accross (55,000 µm) 55,000 / 2448 = 22.5
Station 17
Load and format data:
imagesize17 <- read.csv("~/Desktop/MiraiDPI/st17DPI/st17imagesize.csv", header = FALSE)
imagesize17$V1 <- substring(imagesize17$V1, 3, 16)
split<- do.call(rbind, strsplit(as.character(imagesize17$V2), " "))
imagesize17$height <- as.numeric(split[,2])
imagesize17$width <- as.numeric(split[,4])
imagesize17<- imagesize17[,c(1,5,6)]
names(imagesize17) <- c("photos", "height", "width")
imagesizedepth17<- merge(st17aCTD,imagesize17, by = "photos" )
imagesizedepth17$height <- (imagesizedepth17$height * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth17$width <- (imagesizedepth17$width * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth17$size <- imagesizedepth17$height * imagesizedepth17$width
imagesizedepth17$pos_depth <- imagesizedepth17$Depth * -1
imagesizedepth17$Station <- "st.17"
str(imagesizedepth17)
## 'data.frame': 237 obs. of 7 variables:
## $ photos : chr "20170609085210" "20170609085215" "20170609085215" "20170609085220" ...
## $ Depth : num -12.6 -13.2 -13.2 -13.1 -13.4 ...
## $ height : num 0.54 0.562 0.652 0.652 0.517 ...
## $ width : num 0.585 0.585 0.608 0.608 0.472 ...
## $ size : num 0.316 0.329 0.396 0.396 0.245 ...
## $ pos_depth: num 12.6 13.2 13.2 13.1 13.4 ...
## $ Station : chr "st.17" "st.17" "st.17" "st.17" ...
Station 3
Load and format data:
imagesize3 <- read.csv("~/Desktop/MiraiDPI/st3DPI/st3imagesize.csv", header = FALSE)
imagesize3$V1 <- substring(imagesize3$V1, 3, 16)
split<- do.call(rbind, strsplit(as.character(imagesize3$V2), " "))
imagesize3$height <- as.numeric(split[,2])
imagesize3$width <- as.numeric(split[,4])
imagesize3<- imagesize3[,c(1,5,6)]
names(imagesize3) <- c("photos", "height", "width")
imagesizedepth3<- merge(st3aCTD,imagesize3, by = "photos" )
imagesizedepth3$height <- (imagesizedepth3$height * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth3$width <- (imagesizedepth3$width * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth3$size <- imagesizedepth3$height * imagesizedepth3$width
imagesizedepth3$pos_depth <- imagesizedepth3$Depth * -1
imagesizedepth3$Station <- "st.3"
str(imagesizedepth3)
## 'data.frame': 115 obs. of 7 variables:
## $ photos : chr "20170531074321" "20170531074322" "20170531074340" "20170531074342" ...
## $ Depth : num -7.21 -7.22 -7.09 -7.24 -7.24 ...
## $ height : num 1.035 0.765 0.45 0.765 0.45 ...
## $ width : num 0.922 0.36 0.472 0.652 0.472 ...
## $ size : num 0.955 0.275 0.213 0.499 0.213 ...
## $ pos_depth: num 7.21 7.22 7.09 7.24 7.24 ...
## $ Station : chr "st.3" "st.3" "st.3" "st.3" ...
Station 10
Load and format data:
imagesize10 <- read.csv("~/Desktop/MiraiDPI/st10DPI/st10imagesize.csv", header = FALSE)
imagesize10$V1 <- substring(imagesize10$V1, 3, 16)
split<- do.call(rbind, strsplit(as.character(imagesize10$V2), " "))
imagesize10$height <- as.numeric(split[,2])
imagesize10$width <- as.numeric(split[,4])
imagesize10<- imagesize10[,c(1,5,6)]
names(imagesize10) <- c("photos", "height", "width")
imagesizedepth10<- merge(st10aCTD,imagesize10, by = "photos" )
imagesizedepth10$height <- (imagesizedepth10$height * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth10$width <- (imagesizedepth10$width * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth10$size <- imagesizedepth10$height * imagesizedepth10$width
imagesizedepth10$pos_depth <- imagesizedepth10$Depth * -1
imagesizedepth10$Station <- "st.10"
str(imagesizedepth10)
## 'data.frame': 359 obs. of 7 variables:
## $ photos : chr "20170602061002" "20170602061004" "20170602061007" "20170602061008" ...
## $ Depth : num -18.9 -19.4 -18.8 -18.5 -18.5 ...
## $ height : num 1.103 0.315 0.495 1.058 0.765 ...
## $ width : num 1.08 0.405 1.035 0.945 0.81 ...
## $ size : num 1.191 0.128 0.512 0.999 0.62 ...
## $ pos_depth: num 18.9 19.4 18.8 18.5 18.5 ...
## $ Station : chr "st.10" "st.10" "st.10" "st.10" ...
Station 15
Load and format data:
imagesize15 <- read.csv("~/Desktop/MiraiDPI/st15DPI/st15imagesize.csv", header = FALSE)
imagesize15$V1 <- substring(imagesize15$V1, 3, 16)
split<- do.call(rbind, strsplit(as.character(imagesize15$V2), " "))
imagesize15$height <- as.numeric(split[,2])
imagesize15$width <- as.numeric(split[,4])
imagesize15<- imagesize15[,c(1,5,6)]
names(imagesize15) <- c("photos", "height", "width")
imagesizedepth15<- merge(st15aCTD,imagesize15, by = "photos" )
imagesizedepth15$height <- (imagesizedepth15$height * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth15$width <- (imagesizedepth15$width * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth15$size <- imagesizedepth15$height * imagesizedepth15$width
imagesizedepth15$pos_depth <- imagesizedepth15$Depth * -1
imagesizedepth15$Station <- "st.15"
str(imagesizedepth15)
## 'data.frame': 524 obs. of 7 variables:
## $ photos : chr "20170606150930" "20170606150930" "20170606150931" "20170606150933" ...
## $ Depth : num -12.2 -12.2 -11.8 -11.7 -11.7 ...
## $ height : num 1.282 0.9 0.652 1.597 1.89 ...
## $ width : num 1.215 0.585 0.63 0.81 1.823 ...
## $ size : num 1.558 0.526 0.411 1.294 3.445 ...
## $ pos_depth: num 12.2 12.2 11.8 11.7 11.7 ...
## $ Station : chr "st.15" "st.15" "st.15" "st.15" ...
imagesizedepth_all <- rbind(imagesizedepth10, imagesizedepth15,imagesizedepth17, imagesizedepth3)
imagesizedepth_all$Station <- factor(imagesizedepth_all$Station, levels = c("st.10", "st.3", "st.15", "st.17"))
depthsize_all<- ggplot(imagesizedepth_all, aes(x=pos_depth, y = size)) + geom_point() + theme_bw() +theme(text = element_text(size=12)) + xlab("") +ylab(expression(Acantharian~ROI~area~(mm^2)))+ theme(axis.text.y = element_text(angle = 90, hjust = 1))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))
depthsize_all
p <-imagesizedepth_all %>% dplyr::select(pos_depth, size) %>%
# Add a new column called 'bin': cut the initial 'carat' in bins
mutate( bin=cut_width(pos_depth, width=10, boundary=5) ) %>%
# plot
ggplot( aes(x=bin, y=size) ) +
geom_boxplot(fill="#69b3a2", coef=100) +
xlab("Depth") + theme_bw()+ theme(axis.text.x = element_text(angle = 90, hjust = 1))
p
but X axis is not continuous - bins with without any data are skipped
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:Hmisc':
##
## is.discrete, summarize
## The following object is masked from 'package:ggpubr':
##
## mutate
## The following objects are masked from 'package:reshape':
##
## rename, round_any
## The following object is masked from 'package:matrixStats':
##
## count
## The following object is masked from 'package:IRanges':
##
## desc
## The following object is masked from 'package:S4Vectors':
##
## rename
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following object is masked from 'package:purrr':
##
## compact
sizemediandf <- imagesizedepth_all %>% dplyr::select(pos_depth, size)
cutdataframe <- ddply(sizemediandf, .(cut(sizemediandf$pos_depth, 100)), colwise(mean))
cut_max <- ddply(sizemediandf, .(cut(sizemediandf$pos_depth, 100)), colwise(max))
cut_min <- ddply(sizemediandf, .(cut(sizemediandf$pos_depth, 100)), colwise(min))
names(cutdataframe) <- c("Depth", "meanSize")
names(cut_max)<- c("Depth", "maxSize")
names(cut_min) <- c("Depth", "minSize")
cut2<- merge(cutdataframe, cut_max, by = "Depth")
cut3<- merge(cut2, cut_min, by = "Depth")
cut3$depth1 <- substr(as.character(cut3$Depth), 2, 6)
cut3$depth1<- as.numeric(sapply(strsplit(cut3$depth1, ","), "[", 1))
Are linear regression assumptions met?
meanSizelm<- lm(meanSize ~ depth1, data = cut3)
par(mfrow = c(2, 2))
plot(meanSizelm)
Residuals are close to normally distributed (only 3 points far from straight line), looks homoscedastic
# Evaluate homoscedasticity
# non-constant error variance test
ncvTest(meanSizelm)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 2.277725, Df = 1, p = 0.13124
it is homoscedastic
according to central limit theorem, probably could use a linear regression because sample size is large (82 > 30)
for consistency, will use loess and correlation coefficient:
depthsize_all<- ggplot(cut3, aes(x=depth1, y = meanSize)) + geom_point() + theme_bw() +theme(text = element_text(size=14)) + xlab("") +ylab(expression(Acantharian~ROI~area~(mm^2))) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + geom_pointrange(aes(ymin= minSize , ymax=maxSize), alpha = 0.3)+ theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + theme(axis.text.y = element_text(angle = 90, hjust = 1)) + geom_smooth(span =1 ,color = "#0E899F") + scale_y_continuous(expand = c(0, 0))
depthsize_all
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
#ggsave("size_by_depth.png", width =14, height = 7)
Correlation Coefficient
is data normally distributed?
ggqqplot(cut3$meanSize)
shapiro.test(cut3$meanSize)
##
## Shapiro-Wilk normality test
##
## data: cut3$meanSize
## W = 0.48787, p-value = 2e-15
NO …
–> use Spearman Correlation Coefficient
df_for_corr <- cut3 %>% dplyr::select(meanSize, depth1)
rcorr(as.matrix(df_for_corr), type = "spearman")
## meanSize depth1
## meanSize 1.00 -0.55
## depth1 -0.55 1.00
##
## n= 82
##
##
## P
## meanSize depth1
## meanSize 0
## depth1 0
sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] plyr_1.8.5 reshape2_1.4.3
## [3] CoDaSeq_0.99.2 car_3.0-5
## [5] carData_3.0-3 zCompositions_1.3.3-1
## [7] truncnorm_1.0-8 NADA_1.6-1
## [9] MASS_7.3-51.5 ALDEx2_1.14.1
## [11] Hmisc_4.3-0 Formula_1.2-3
## [13] survival_3.1-8 ggpubr_0.2.4
## [15] magrittr_1.5 gridExtra_2.3
## [17] wesanderson_0.3.6.9000 reshape_0.8.8
## [19] DESeq2_1.22.2 SummarizedExperiment_1.12.0
## [21] DelayedArray_0.8.0 BiocParallel_1.16.6
## [23] matrixStats_0.55.0 Biobase_2.42.0
## [25] GenomicRanges_1.34.0 GenomeInfoDb_1.18.2
## [27] IRanges_2.16.0 S4Vectors_0.20.1
## [29] BiocGenerics_0.28.0 RColorBrewer_1.1-2
## [31] vegan_2.5-6 lattice_0.20-38
## [33] permute_0.9-5 phyloseq_1.26.1
## [35] qiime2R_0.99.1 forcats_0.4.0
## [37] stringr_1.4.0 dplyr_0.8.3
## [39] purrr_0.3.3 readr_1.3.1
## [41] tidyr_1.0.0 tibble_3.0.1
## [43] ggplot2_3.3.0 tidyverse_1.3.0
##
## loaded via a namespace (and not attached):
## [1] readxl_1.3.1 backports_1.1.5 igraph_1.2.4.2
## [4] splines_3.5.0 digest_0.6.23 foreach_1.4.7
## [7] htmltools_0.4.0 fansi_0.4.0 checkmate_1.9.4
## [10] memoise_1.1.0 cluster_2.1.0 openxlsx_4.1.4
## [13] Biostrings_2.50.2 annotate_1.60.1 modelr_0.1.5
## [16] colorspace_1.4-1 blob_1.2.0 rvest_0.3.5
## [19] haven_2.2.0 xfun_0.11 crayon_1.3.4
## [22] RCurl_1.95-4.12 jsonlite_1.6 genefilter_1.64.0
## [25] iterators_1.0.12 ape_5.3 glue_1.3.1
## [28] gtable_0.3.0 zlibbioc_1.28.0 XVector_0.22.0
## [31] Rhdf5lib_1.4.3 abind_1.4-5 scales_1.1.0
## [34] DBI_1.0.0 Rcpp_1.0.3 xtable_1.8-4
## [37] htmlTable_1.13.3 foreign_0.8-72 bit_1.1-14
## [40] htmlwidgets_1.5.1 httr_1.4.1 acepack_1.4.1
## [43] ellipsis_0.3.0 farver_2.0.1 pkgconfig_2.0.3
## [46] XML_3.98-1.20 nnet_7.3-12 dbplyr_1.4.2
## [49] utf8_1.1.4 locfit_1.5-9.1 labeling_0.3
## [52] tidyselect_0.2.5 rlang_0.4.5 AnnotationDbi_1.44.0
## [55] munsell_0.5.0 cellranger_1.1.0 tools_3.5.0
## [58] cli_2.0.0 generics_0.0.2 RSQLite_2.1.4
## [61] ade4_1.7-13 broom_0.5.6 evaluate_0.14
## [64] biomformat_1.10.1 yaml_2.2.0 knitr_1.26
## [67] bit64_0.9-7 fs_1.3.1 zip_2.0.4
## [70] nlme_3.1-140 xml2_1.2.2 compiler_3.5.0
## [73] rstudioapi_0.10 curl_4.3 ggsignif_0.6.0
## [76] reprex_0.3.0 geneplotter_1.60.0 stringi_1.4.3
## [79] Matrix_1.2-18 multtest_2.38.0 vctrs_0.2.4
## [82] pillar_1.4.3 lifecycle_0.2.0 cowplot_1.0.0
## [85] data.table_1.12.8 bitops_1.0-6 R6_2.4.1
## [88] latticeExtra_0.6-28 rio_0.5.16 codetools_0.2-16
## [91] assertthat_0.2.1 rhdf5_2.26.2 withr_2.1.2
## [94] GenomeInfoDbData_1.2.0 mgcv_1.8-31 hms_0.5.2
## [97] grid_3.5.0 rpart_4.1-15 rmarkdown_1.18
## [100] lubridate_1.7.4 base64enc_0.1-3